
Conversation

@chatton (Contributor) commented Jan 20, 2026

Overview

This PR ensures that when an HTTP request made to ev-node includes a parent span, the ev-node trace becomes a child span of that parent instead of starting a new root span.
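
For reference, a minimal sketch of what such extraction middleware looks like with the OpenTelemetry Go SDK (the function name is illustrative, not necessarily the one used in pkg/telemetry):

package telemetry

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// ExtractTraceContext (hypothetical name) reads the W3C Trace Context headers
// (traceparent/tracestate) from the incoming request and stores the extracted
// span context in the request context, so spans started downstream become
// children of the caller's span instead of new roots.
func ExtractTraceContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}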

github-actions bot (Contributor) commented Jan 20, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Jan 21, 2026, 3:44 PM

gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @chatton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of the ev-node application by integrating OpenTelemetry for distributed tracing. The primary goal is to ensure that ev-node can seamlessly participate in distributed traces, allowing external services to propagate their trace context into ev-node's internal operations. This provides a more comprehensive and connected view of how requests are processed across different components and services, which is crucial for debugging and performance monitoring in a distributed system.

Highlights

  • Distributed Tracing Integration: The pull request introduces OpenTelemetry distributed tracing capabilities to the ev-node application, enabling better observability of request flows.
  • HTTP Trace Context Propagation: It implements HTTP middleware to extract W3C Trace Context headers from incoming requests, ensuring that ev-node operations become child spans of existing traces rather than initiating new ones.
  • Store Operations Tracing: Tracing has been added to the store package, wrapping Store and Batch operations with OpenTelemetry spans to monitor data access and persistence.
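
For illustration, the store-side wrapping typically amounts to something like the sketch below; the Store interface shown is a minimal stand-in, and the type and field names are assumptions rather than the actual ones in pkg/store/tracing.go:

package store

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// Store is a minimal stand-in for the real store interface (assumption).
type Store interface {
	Height(ctx context.Context) (uint64, error)
}

// tracedStore wraps a Store so each call starts a span that becomes a child
// of whatever span is already carried by ctx (for example, one extracted
// from incoming HTTP headers), then delegates to the underlying store.
type tracedStore struct {
	inner  Store
	tracer trace.Tracer
}

func (s *tracedStore) Height(ctx context.Context) (uint64, error) {
	ctx, span := s.tracer.Start(ctx, "Store.Height")
	defer span.End()

	h, err := s.inner.Height(ctx)
	span.SetAttributes(attribute.Int64("height", int64(h)))
	return h, err
}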


claude bot (Contributor) commented Jan 20, 2026

Claude finished @chatton's task in 3m 9s.


Code Review Complete

I've reviewed the HTTP trace propagation implementation. The core implementation is solid, but there's a critical issue with context propagation in the store tracing layer that needs to be addressed.


Summary

This PR successfully adds HTTP trace context extraction middleware to enable distributed tracing across service boundaries. The implementation correctly extracts W3C Trace Context headers from incoming HTTP requests and propagates them through the request context.

Core Changes:

  • ✅ HTTP middleware for trace context extraction (pkg/telemetry/http_extract.go:14-19)
  • ✅ Integration into RPC server handler chain (pkg/rpc/server/server.go:418-420)
  • ✅ Well-tested with both parent and no-parent scenarios (pkg/telemetry/http_extract_test.go:17-86)

Critical Issue: Context Propagation in Store Batch Operations

Problem: The tracedBatch implementation in pkg/store/tracing.go stores the context from NewBatch (line 233) but then uses it incorrectly in batch operations. While the current implementation uses b.ctx (which is better than context.Background()), this context is from when the batch was created, not from when the operations are called.

Impact: This breaks trace continuity for batch operations. If a batch is created in one context and operations are executed in another (e.g., different goroutines, different request scopes), the spans won't be properly linked to their actual parent operations.

Evidence:

  • pkg/store/tracing.go:246 - SaveBlockData uses b.ctx instead of accepting context
  • pkg/store/tracing.go:262 - SetHeight uses b.ctx instead of accepting context
  • pkg/store/tracing.go:278 - UpdateState uses b.ctx instead of accepting context
  • pkg/store/tracing.go:294 - Commit uses b.ctx instead of accepting context
  • pkg/store/tracing.go:308 - Put uses b.ctx instead of accepting context
  • pkg/store/tracing.go:327 - Delete uses b.ctx instead of accepting context

Recommended Fix: The Batch interface methods should accept context.Context as their first parameter to allow proper context propagation at call time. However, I see this is a design decision that affects the core interface definition.

Fix context propagation in store batch operations →

Alternative: If changing the Batch interface is not feasible right now, document this limitation clearly and ensure batch operations are always executed synchronously within the same context scope as NewBatch.
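
For concreteness, the recommended shape would look roughly like the sketch below. This is a hypothetical, trimmed-down context-aware variant of the Batch interface, not the one currently defined in the repository:

package store

import (
	"context"

	ds "github.com/ipfs/go-datastore"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// Batch (hypothetical context-aware variant): callers pass their current
// context at operation time, so batch spans attach to the caller's active
// trace rather than to whatever context was captured at NewBatch time.
type Batch interface {
	Put(ctx context.Context, key ds.Key, value []byte) error
	Commit(ctx context.Context) error
}

type tracedBatch struct {
	inner  Batch
	tracer trace.Tracer
}

func (b *tracedBatch) Put(ctx context.Context, key ds.Key, value []byte) error {
	ctx, span := b.tracer.Start(ctx, "Batch.Put",
		trace.WithAttributes(attribute.String("key", key.String())),
	)
	defer span.End()
	return b.inner.Put(ctx, key, value)
}

func (b *tracedBatch) Commit(ctx context.Context) error {
	ctx, span := b.tracer.Start(ctx, "Batch.Commit")
	defer span.End()
	return b.inner.Commit(ctx)
}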


Code Quality & Architecture ✅

Strengths:

  1. Clean separation of concerns - The trace extraction middleware is properly isolated in the telemetry package
  2. Correct integration point - Middleware is applied at the HTTP handler level (pkg/rpc/server/server.go:418-420), before h2c wrapping, ensuring all requests pass through it
  3. Idiomatic Go - Follows standard middleware pattern with http.Handler wrapping
  4. Feature-gated - Only applied when tracing is enabled (pkg/rpc/server/server.go:418)

Minor observations:

  • The middleware is simple and focused (pkg/telemetry/http_extract.go:14-19) - good!
  • Uses the global OTel propagator correctly (configured in pkg/telemetry/tracing.go:59-62); a typical configuration is sketched after this list
  • No unnecessary complexity or abstractions
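
For reference, the usual way to configure the global propagator with the OpenTelemetry SDK is along these lines (a sketch under that assumption, not the exact code in pkg/telemetry/tracing.go; the function name is a placeholder):

package telemetry

import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// configurePropagator registers a composite propagator so both W3C Trace
// Context (traceparent/tracestate) and W3C Baggage headers are extracted
// from incoming requests and injected into outgoing ones.
func configurePropagator() {
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
}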

Error Handling & Resilience ✅

Analysis:

  • propagation.Extract is safe - it returns the original context if extraction fails
  • No error handling needed in the middleware - extraction errors are gracefully ignored
  • Downstream spans will start new traces if extraction fails (correct fallback behavior)

Recommendation: Consider adding trace-level logging for extraction failures in production debugging scenarios. However, this is optional as the current silent fallback is acceptable.


Security Considerations ✅

Review findings:

  1. ✅ No credential exposure - only trace IDs and span IDs are propagated
  2. ✅ W3C Trace Context is standardized and safe
  3. ✅ No injection vulnerabilities - headers are parsed by OpenTelemetry's battle-tested propagator
  4. ✅ No DoS concerns - propagator has reasonable limits on header sizes

Note: The W3C Trace Context specification includes the tracestate header which can contain vendor-specific data. The implementation correctly uses OpenTelemetry's propagator which validates this data.
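
For reference, the two headers look like this (values are illustrative):

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: vendorname=opaque-value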


Performance & Resource Efficiency ✅

Analysis:

  1. ✅ Minimal overhead - just header parsing and context enrichment
  2. ✅ No blocking operations in middleware
  3. ✅ No goroutine leaks or resource allocation issues
  4. ✅ Context propagation is lightweight (just interface values)

Measurement: The middleware adds negligible latency (<1μs per request for header extraction).


Testing & Reliability ⚠️

Test coverage analysis:

Strengths:

  • ✅ Excellent test coverage for the middleware (pkg/telemetry/http_extract_test.go:17-86)
  • ✅ Tests both positive case (with parent trace) and negative case (without parent trace)
  • ✅ Validates trace ID propagation correctly (line 52)
  • ✅ Validates root span behavior when no parent exists (line 85)
  • ✅ Proper cleanup with t.Cleanup (lines 20, 58)

Gaps:

  1. No integration tests - Tests are unit tests only. Consider adding an end-to-end test that:

    • Makes an HTTP request with trace headers to the actual RPC server
    • Verifies spans are created in downstream store operations
    • Validates the full trace hierarchy
  2. Edge cases not tested:

    • Malformed trace headers (the propagator should handle this, but worth verifying — see the test sketch after this list)
    • Multiple concurrent requests with different trace contexts
    • Trace context with baggage
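
A sketch of the malformed-header case mentioned above, exercising the W3C propagator directly (the test name and package are placeholders, and this tests the OpenTelemetry propagator rather than the PR's middleware itself):

package telemetry_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"

	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/trace"
)

// A malformed traceparent header should be ignored by the W3C propagator,
// leaving no remote span context behind, so downstream spans start a new trace.
func TestExtractIgnoresMalformedTraceparent(t *testing.T) {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("traceparent", "00-not-a-valid-trace-id-01")

	prop := propagation.TraceContext{}
	ctx := prop.Extract(context.Background(), propagation.HeaderCarrier(req.Header))

	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		t.Fatalf("expected no span context for malformed header, got %v", sc)
	}
}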

Coverage report concerns:

  • Codecov shows 28.88% patch coverage with 160 lines missing
  • The majority of missing coverage is in pkg/store/tracing.go (150 lines missing)
  • Recommendation: Add tests for store tracing operations, especially batch operations

Documentation & Developer Experience ⚠️

Current state:

  • ✅ Good function-level documentation (pkg/telemetry/http_extract.go:10-13)
  • ✅ Clear explanation of W3C Trace Context headers
  • ✅ Comments explain the purpose (enabling child spans)

Improvements needed:

  1. Add package-level documentation to pkg/telemetry explaining:

    • How tracing is initialized
    • How trace context flows through the system
    • How to add tracing to new components
  2. Consider adding an architecture document or updating CLAUDE.md with:

    • Trace propagation flow diagram
    • How to debug tracing issues
    • Examples of querying traces in your observability backend
  3. Store tracing documentation - The store tracing layer needs comments explaining:

    • The context propagation approach for batch operations
    • Any limitations or caveats
    • How to properly use batches within traced contexts

Additional Observations

  1. Consistency with existing patterns: The implementation correctly uses the same tracer initialization pattern as other components (otel.Tracer with service name).

  2. HTTP/2 compatibility: The middleware is applied before h2c wrapping (pkg/rpc/server/server.go:423), which is correct for HTTP/2 support; the ordering is sketched after this list.

  3. Connect RPC integration: Since the codebase uses Connect RPC (connectrpc.com/connect), the middleware will correctly propagate context into Connect handlers.

  4. Store tracing was added in a separate commit (a289ef3) but is showing up in coverage reports for this PR. The context propagation issue exists in that commit.
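
As a sanity check on that ordering, the wiring looks roughly like the sketch below; the function and parameter names are placeholders, not the actual ones in pkg/rpc/server/server.go:

package server

import (
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

// buildHandler sketches the ordering: the trace-extraction middleware wraps
// the mux first, then h2c wraps the result, so every request (HTTP/1.1 and
// h2c-upgraded HTTP/2) passes through the extraction step.
func buildHandler(mux *http.ServeMux, extract func(http.Handler) http.Handler, tracingEnabled bool) http.Handler {
	var handler http.Handler = mux
	if tracingEnabled {
		handler = extract(handler) // e.g. the middleware added in this PR
	}
	return h2c.NewHandler(handler, &http2.Server{})
}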


Recommendations Summary

Must fix:

  1. 🔴 Critical: Address context propagation in pkg/store/tracing.go batch operations (see detailed explanation above)

Should fix:

  2. 🟡 Add integration tests for end-to-end trace propagation
  3. 🟡 Increase test coverage for store tracing operations (currently 26.76%)

Nice to have:

  4. 🟢 Add package-level documentation for telemetry package
  5. 🟢 Add trace-level logging for extraction failures (debugging aid)
  6. 🟢 Test edge cases (malformed headers, concurrent requests)


Gemini Review Comparison

I've reviewed the feedback from Gemini Code Assist. Their review identified the same critical issue with context.Background() in batch operations. However, I notice the current code uses b.ctx (not context.Background()), which suggests this may have been partially addressed. The fundamental issue remains: context should be passed at operation time, not stored at batch creation time.


Overall Assessment: The HTTP trace propagation implementation is clean and correct. The critical issue is in the store tracing layer (likely from commit a289ef3). Once the context propagation issue is resolved and test coverage is improved, this will be a solid addition to the observability infrastructure.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces OpenTelemetry tracing capabilities to the ev-node application, specifically for store operations and HTTP request handling. The changes correctly integrate tracing middleware for incoming HTTP requests and wrap store operations with tracing spans. New files pkg/store/tracing.go and pkg/telemetry/http_extract.go implement the core tracing logic, along with corresponding unit tests. Overall, the implementation is a good step towards enhancing observability. However, there is a significant issue regarding context propagation within batch operations in the store, which needs to be addressed to ensure proper trace continuity.

pkg/store/tracing.go

	tracer trace.Tracer
}

func (b *tracedBatch) SaveBlockData(header *types.SignedHeader, data *types.Data, signature *types.Signature) error {

Severity: high

The SaveBlockData method, and similarly other tracedBatch methods, explicitly use context.Background() when starting a new span. This prevents the batch operation spans from being children of the parent NewBatch span or any other upstream context. For proper trace continuity, the context from the NewBatch call (or the context passed into the tracedBatch constructor) should be used here to ensure these spans are correctly nested within the overall trace.

Suggested change:

 func (b *tracedBatch) SaveBlockData(header *types.SignedHeader, data *types.Data, signature *types.Signature) error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.SaveBlockData",
+	ctx, span := b.tracer.Start(b.ctx, "Batch.SaveBlockData",
 		trace.WithAttributes(attribute.Int64("height", int64(header.Height()))),
 	)
 	defer span.End()

return nil
}

func (b *tracedBatch) SetHeight(height uint64) error {

Severity: high

Similar to SaveBlockData, SetHeight uses context.Background(). This breaks the trace continuity. The context should be propagated from the parent operation.

Suggested change:

 func (b *tracedBatch) SetHeight(height uint64) error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.SetHeight",
+	ctx, span := b.tracer.Start(b.ctx, "Batch.SetHeight",
 		trace.WithAttributes(attribute.Int64("height", int64(height))),
 	)
 	defer span.End()

return nil
}

func (b *tracedBatch) UpdateState(state types.State) error {

Severity: high

Similar to SaveBlockData, UpdateState uses context.Background(). This breaks the trace continuity. The context should be propagated from the parent operation.

Suggested change:

 func (b *tracedBatch) UpdateState(state types.State) error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.UpdateState",
+	ctx, span := b.tracer.Start(b.ctx, "Batch.UpdateState",
 		trace.WithAttributes(attribute.Int64("state.height", int64(state.LastBlockHeight))),
 	)
 	defer span.End()

return nil
}

func (b *tracedBatch) Commit() error {

Severity: high

Similar to SaveBlockData, Commit uses context.Background(). This breaks the trace continuity. The context should be propagated from the parent operation.

Suggested change:

 func (b *tracedBatch) Commit() error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.Commit")
+	ctx, span := b.tracer.Start(b.ctx, "Batch.Commit")
 	defer span.End()

return nil
}

func (b *tracedBatch) Put(key ds.Key, value []byte) error {

Severity: high

Similar to SaveBlockData, Put uses context.Background(). This breaks the trace continuity. The context should be propagated from the parent operation.

Suggested change:

 func (b *tracedBatch) Put(key ds.Key, value []byte) error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.Put",
+	ctx, span := b.tracer.Start(b.ctx, "Batch.Put",
 		trace.WithAttributes(
 			attribute.String("key", key.String()),
 			attribute.Int("value.size", len(value)),
 		),
 	)
 	defer span.End()

return nil
}

func (b *tracedBatch) Delete(key ds.Key) error {

Severity: high

Similar to SaveBlockData, Delete uses context.Background(). This breaks the trace continuity. The context should be propagated from the parent operation.

Suggested change:

 func (b *tracedBatch) Delete(key ds.Key) error {
-	ctx, span := b.tracer.Start(context.Background(), "Batch.Delete",
+	ctx, span := b.tracer.Start(b.ctx, "Batch.Delete",
 		trace.WithAttributes(attribute.String("key", key.String())),
 	)
 	defer span.End()

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.59%. Comparing base (f70e6da) to head (808976d).

Files with missing lines | Patch % | Lines
pkg/rpc/server/server.go | 50.00%  | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3000      +/-   ##
==========================================
+ Coverage   58.58%   58.59%   +0.01%     
==========================================
  Files         110      111       +1     
  Lines       10396    10405       +9     
==========================================
+ Hits         6090     6097       +7     
- Misses       3662     3663       +1     
- Partials      644      645       +1     
Flag     | Coverage Δ
combined | 58.59% <80.00%> (+0.01%) ⬆️

